1. Summary of Research Accomplishments

Virtually  all  of  the  important  research  results  obtained  during  execution  of 
this  grant  were  reported  in  journal  articles  and  conference  papers,  and  these  are 
listed  in  Section  3.  We  give  here  brief  summaries  of  the  major  contributions; 
reference  numbers  are  to  Section  3. 

A.  Digital  Signal  Processing  Algorithms:  VLSI  Implementation  and 
Complexity  Issues  [1-20] 

The  development  of  very  dense  chips  (Very  Large  Scale  Integrated  Circuits, 
or  VLSI)  has  great  potential  value  for  digital  signal  processing,  not  just  because 
standard  signal  processing  elements  can  be  made  smaller  and  cheaper,  but  also 
because  it  allows  computations  to  be  spread  out  geometrically  in  such  a  way  that 
many computations are done simultaneously. In a typical signal processing
application, such as filtering a signal to remove noise, the signal becomes
available  in  a  linear  stream,  and  can  flow  through  the  chip  continuously.  We  can 
then  make  maximum  use  of  the  idea  of  pipelining.  That  is,  when  the  rear  part  of 
the  chip  is  operating  on  a  given  part  of  the  signal,  the  front  part  of  the  chip  can 
operate  on  the  next  part  of  the  signal.  If  the  chip  is  designed  specially  for  a  given 
task,  it  is  possible  that  every  part  of  the  chip  is  doing  useful  work  all  the  time; 
the chip is "completely pipelined."
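
The benefit of pipelining described above can be illustrated with a small
cycle-count model. This is only a sketch under simplifying assumptions that are
not taken from the cited papers: the chip is idealized as a chain of equal-delay
stages, one sample enters per clock cycle, and the function names are
hypothetical.

```python
def pipelined_cycles(num_samples: int, num_stages: int) -> int:
    """Clock cycles for num_samples to pass through a chain of num_stages
    stages when every stage accepts a new sample on each cycle."""
    if num_samples <= 0:
        return 0
    # The first sample needs num_stages cycles to emerge; each later
    # sample emerges one cycle after its predecessor.
    return num_stages + (num_samples - 1)


def unpipelined_cycles(num_samples: int, num_stages: int) -> int:
    """Cycles when a sample must leave the chain before the next one enters."""
    return max(num_samples, 0) * num_stages


def utilization(num_samples: int, num_stages: int) -> float:
    """Fraction of stage-cycles doing useful work in the pipelined case."""
    total = pipelined_cycles(num_samples, num_stages)
    if total == 0:
        return 0.0
    # Each sample occupies each stage for exactly one cycle.
    return (num_samples * num_stages) / (num_stages * total)
```

Under this model, a stream of 1000 samples through 8 stages takes 1007 cycles
instead of 8000, and utilization approaches 1 as the stream grows long relative
to the depth of the chain.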

In  a  general  purpose  computer,  in  which  an  arbitrary  set  of  instructions  is 
executed, it is difficult to schedule the operations so that basic parts of the
computer are pipelined to a great extent. And even then, there are usually only
one or two basic blocks for such fundamental tasks as multiplication, addition,
and so on. If we want to invest effort in designing a single-purpose dedicated
chip, however, we can try to lay out the computational blocks in a way that
allows completely pipelined operation of many computational blocks. A general review of
these  topics  was  given  in  [14]. 

In [3, 7] a definition is given for a class of completely pipelined VLSI
architectures. Two topologies are then described: leaf-connected and
mesh-connected trees. Layouts are then described which use these topologies to
implement multipliers and convolvers that make efficient use of time and space.

In [5, 6] we investigate the optimal tradeoff between the degree of
intermediate latching and cost in special-purpose VLSI chips, using the measure
Area · Period. The results show that significant reductions in the AP-product
(the reciprocal of throughput per unit area) can be achieved by intermediate
latching in many typical signal processing applications.
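
The tradeoff behind the AP-product can be made concrete with a toy model. The
model below is an assumption introduced for illustration and is not the cost
model of [5, 6]: a block of combinational logic with a fixed area and delay is
split into equal stages by latches, each latch adding a fixed area and a fixed
delay to the clock period. All names and numbers are hypothetical.

```python
def ap_product(num_latches: int, logic_area: float, latch_area: float,
               logic_delay: float, latch_delay: float) -> float:
    """Area * Period for a logic block cut into num_latches + 1 equal stages.

    Assumed model: every latch adds latch_area to the total area, and the
    clock period is one stage of logic delay plus one latch delay.
    """
    area = logic_area + num_latches * latch_area
    period = logic_delay / (num_latches + 1) + latch_delay
    return area * period


def best_latch_count(max_latches: int, logic_area: float, latch_area: float,
                     logic_delay: float, latch_delay: float) -> int:
    """Number of latches in 0..max_latches that minimizes the AP-product."""
    return min(
        range(max_latches + 1),
        key=lambda k: ap_product(k, logic_area, latch_area,
                                 logic_delay, latch_delay),
    )
```

With the illustrative values logic_area=20, latch_area=4, logic_delay=16, and
latch_delay=1, the unlatched design has an AP-product of 340, while the best
choice under this model, 7 latches, gives 144. Beyond that point the added
latch area outweighs the shorter period.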

In [11] completely pipelined inner-product architectures are presented for
FIR filters and linear transformers. The designs use only multipliers: the
summing of products requires no additional area or time.
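
For reference, the computation these architectures perform is an inner product
of the filter coefficients with a sliding window of the input. The sketch
below is a plain software statement of that computation, assuming zero initial
conditions. It says nothing about the hardware layout in [11], and the
function name is hypothetical.

```python
from typing import List, Sequence


def fir_filter(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Direct-form FIR filter: y[n] = sum over k of h[k] * x[n - k].

    Samples before the start of x are taken to be zero.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        # Inner product of the coefficients with the most recent inputs.
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        y.append(acc)
    return y
```

For example, filtering the input [1, 2, 3] with the two-tap coefficients [1, 1]
gives [1.0, 3.0, 5.0].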

Testing  is  an  important  issue  in  the  implementation  of  high  density  custom 
chips.  In  [12,  13]  we  describe  two  sets  of  conditions  that  make  one-dimensional 
bilateral  arrays  of  combinational  cells  testable  for  single  faulty  cells.  The  test 
sequences  are  preset,  and  in  the  worst  case,  grow  quadratically  with  the  size  of 
the  array.  A  systolic  implementation  of  an  FIR  filter  is  given  as  an  example. 

Reference  [16]  addresses  the  problem  of  reducing  the  page  faulting  that 
occurs when large VLSI layout algorithms are run in a paged memory
environment. Algorithms are presented that take advantage of reference locality and
have  good  upper  bounds  on  the  number  of  page  faults. 

In  [9,  17,  18]  a  geometric  representation  of  array  computation  is  presented. 
Well  known  systolic  designs  are  related  to  one  another  by  linear  transformations 
of  a  three-dimensional  vector  space,  where  one  of  the  dimensions  is  time.  Much 
previous  work  is  unified  in  this  way,  including  convolvers,  linear  transforms, 
matrix  product,  and  matrix  transposition.  The  approach  suggests  new  designs  for 
these computations, some of which are asymptotically optimal under an
appropriate VLSI complexity measure.

In [8, 15] two particular layouts are described. In the first reference, a
Dadda multiplier composed of parallel counters is described. We analyze the
complexity of the resulting design, and show that it is optimal with respect to
both its period and latency. In [15] we describe the design, layout, and
simulation of a recursively defined VLSI chip, using a constraint-based,
procedural language developed at Princeton. The chip implements a regular,
recursive structure for a parallel counter. Several instantiations of the
design were fabricated by the MOSIS facility and successfully tested.
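
A parallel counter outputs, in binary, the number of its input bits that are 1.
One way to see why the structure is naturally recursive is the bit-level sketch
below. It is an illustrative assumption, not the layout of [15]: the inputs are
split in half, each half is counted recursively, and the two counts are summed
with a ripple-carry adder built from full adders. All function names are
hypothetical.

```python
from typing import List, Tuple


def full_adder(a: int, b: int, c: int) -> Tuple[int, int]:
    """Return (sum bit, carry bit) for three input bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def ripple_add(x: List[int], y: List[int]) -> List[int]:
    """Add two little-endian bit vectors using a chain of full adders."""
    width = max(len(x), len(y))
    x = x + [0] * (width - len(x))
    y = y + [0] * (width - len(y))
    out, carry = [], 0
    for a, b in zip(x, y):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)  # keep the final carry as the top bit
    return out


def parallel_counter(bits: List[int]) -> List[int]:
    """Little-endian binary count of the ones in bits, built recursively."""
    if len(bits) == 0:
        return [0]
    if len(bits) == 1:
        return [bits[0]]
    mid = len(bits) // 2
    # Count each half independently, then add the two partial counts.
    return ripple_add(parallel_counter(bits[:mid]),
                      parallel_counter(bits[mid:]))


def to_int(bits: List[int]) -> int:
    """Convert a little-endian bit vector to an integer."""
    return sum(b << i for i, b in enumerate(bits))
```

For instance, to_int(parallel_counter([1, 0, 1, 1, 0, 1, 1])) evaluates to 5.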

References  [1,  10,  19]  deal  with  more  abstract  complexity  issues.  The  first 
paper considers the complexity of finding optimal fixed- or variable-length
unambiguous address codes for the nodes of a packet radio network. The second paper
shows  that  some  practical  problems  in  the  implementation  of  FIR  digital  filters 
are  NP-complete,  and  therefore  likely  to  be  intractable.  These  include  minimizing 
the time, number of additions, or number of registers needed to implement a
fixed FIR digital filter. In [19], the question of the complexity of analog
computation is considered. The operation of certain physical devices is reduced to
a strong version of Church's Thesis, the hypothesis that P ≠ NP, and a downhill
principle.  From  these  arguments  we  can  draw  conclusions  about  the  efficiency  of 
analog  devices  used  as  computers,  based  on  complexity  considerations. 

Two  algorithms  for  digital  signal  processing  are  described  in  [2,  4].  In  the 
first,  we  present  a  program  for  computing  the  unwrapped  phase  of  a  signal,  using 
an  algorithm  for  factoring  very  high  degree  polynomials.  In  the  second,  we  treat 
the problem of designing lowpass FIR digital filters that are very flat at zero
frequency, smooth in the passband, and minimax in the stopband, using linear
programming. The cases of lowpass and lowpass-differentiator filters are studied in detail.

Finally,  in  [20],  we  study  a  parallel  processing  architecture  for  the  solution  of 
partial  differential  equations  by  point  iteration,  using  a  circulating  memory.  If  N 
is  the  number  of  processors,  the  hardware  utilization  efficiency  remains  above 
90% in the one-dimensional case, and above 75% in the two-dimensional case, for up
to  N/2  processors,  but  there  are  sharply  diminishing  returns  for  more  than  N/2 
processors. 

B.  Digital  Signal  Processing  in  an  Unsure  Noise  Environment  [21-32] 

Although a well-defined theory exists which specifies optimal systems for
signal detection and/or estimation in noise, this theory requires an
essentially complete knowledge of the statistical properties of the noise. In
most practical situations, only a part of this information is available,
because noise statistics change so rapidly that a complete description cannot
be acquired. Three ways to attack this problem are: (1) make decisions rapidly
enough so that noise statistics do not change appreciably during the decision
time, (2) devise nearly optimal decision procedures which allow a region of
uncertainty in the noise statistics, and (3) attempt to find noise models which
are non-Gaussian but still relatively tractable.

Sequential detection schemes attempt to make a decision for "signal" or "no
signal" after each observation. Fixed-sample detectors process a fixed number of
observations before making a decision. The latter are simpler, but the former
are more efficient in that they lead to decisions in a shorter average time. We
have investigated sequential detectors modified so that their complexity is
not much greater than that of fixed-sample detectors while their performance is
nearly equal to that of sequential systems [22, 25, 26].
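
The contrast between sequential and fixed-sample detection can be sketched with
Wald's sequential probability ratio test, truncated at a maximum sample count.
This is a textbook construction used here as a stand-in. It is not the modified
detector of [22, 25, 26], and it assumes independent Gaussian noise of known
variance and a known constant signal level. The function and parameter names
are hypothetical.

```python
import math
from typing import Sequence, Tuple


def truncated_sprt(samples: Sequence[float], signal_mean: float,
                   noise_std: float, alpha: float, beta: float,
                   max_samples: int) -> Tuple[str, int]:
    """Decide "signal" or "no signal"; return (decision, samples used).

    alpha is the target false-alarm probability and beta the target miss
    probability, used in Wald's approximate thresholds.
    """
    upper = math.log((1.0 - beta) / alpha)   # cross this: decide "signal"
    lower = math.log(beta / (1.0 - alpha))   # cross this: decide "no signal"
    llr = 0.0
    used = 0
    for x in samples[:max_samples]:
        used += 1
        # Log-likelihood ratio increment for N(signal_mean, noise_std^2)
        # against N(0, noise_std^2).
        llr += signal_mean * (x - signal_mean / 2.0) / noise_std ** 2
        if llr >= upper:
            return "signal", used
        if llr <= lower:
            return "no signal", used
    # Truncation: force a decision from the sign of the accumulated ratio.
    return ("signal" if llr > 0 else "no signal"), used
```

The test stops as soon as the accumulated evidence is decisive, which is the
source of the shorter average decision time. The truncation bounds the worst
case, in the same way that a fixed-sample detector bounds it.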

A well-known technique to overcome insufficient knowledge of the corrupting
noise in detection/estimation systems is to define a class of noises which is
certain to contain the noise in question. Then a detector/estimator is designed
which is optimal for the worst noise in the class. Such min-max systems tend to
be suboptimal in performance, of course, but tend to be robust in the sense
that reasonable deviations in the noise statistics produce small degradations
in performance. We have designed and evaluated such systems with particular
attention to requiring system simplicity and to taking account of the
non-independence of noise sequences [24, 29].

Although  classical  detection  and  estimation  theory  does  not  require  the 
Gaussian  assumption  for  noise  models,  the  implementation  of  these  schemes  does 
rely  heavily  on  the  tractability  of  multivariate  Gaussian  statistics.  We  have 
devoted  a  major  effort  to  dispensing  with  the  Gaussian  assumption  since  many  of 
the important current signal extraction and detection problems occur in a
non-Gaussian noise environment. We have devised models [21, 23, 27, 28, 31, 32] for
classes  of  such  noises  with  the  aim  of  retaining  some  of  the  tractability  of  the 
Gaussian  case  while  modelling  reasonable  non-Gaussian  noises.  We  have 
obtained  locally  optimal  detectors  for  a  wide  class  of  non-Gaussian  but  dependent 
noise  structures  [30]. 
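
For orientation, the standard form of a locally optimal (weak-signal) detector
in the simplest setting is given below. The setting is an assumption made for
illustration: a known signal shape in independent, identically distributed
noise with density f. The dependent-noise structures treated in [30] are more
general, and the symbols used here are not taken from that paper.

```latex
% Observations: x_i = \theta s_i + n_i, \; i = 1, \dots, N, with known s_i,
% small amplitude \theta > 0, and i.i.d. noise n_i with density f.
T(x) = \sum_{i=1}^{N} s_i \, g_{\mathrm{LO}}(x_i)
       \;\underset{\text{no signal}}{\overset{\text{signal}}{\gtrless}}\; \tau,
\qquad
g_{\mathrm{LO}}(x) = -\frac{f'(x)}{f(x)} .

% Gaussian noise, f(x) \propto e^{-x^2 / (2\sigma^2)}:
g_{\mathrm{LO}}(x) = \frac{x}{\sigma^2}
\quad \text{(linear correlator)}

% Laplacian noise, f(x) \propto e^{-|x| / b}:
g_{\mathrm{LO}}(x) = \frac{\operatorname{sgn}(x)}{b}
\quad \text{(sign correlator)}
```

The Gaussian case recovers the familiar linear correlator, while heavier-tailed
densities lead to nonlinearities that limit the influence of large
observations.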